
    A simple deterministic algorithm for guaranteeing the forward progress of transactions

    This paper describes a remarkably simple deterministic (not probabilistic) contention-management algorithm for guaranteeing the forward progress of transactions - avoiding deadlocks, livelocks, and other anomalies. The transactions must be finite (no infinite loops), but on each restart, a transaction may access different shared-memory locations. The algorithm supports irrevocable transactions as long as the transaction satisfies a simple ordering constraint. In particular, a transaction that accesses only one shared-memory location is never aborted. The algorithm is suitable for both hardware and software transactional-memory systems. It also can be used in some contexts as a locking protocol for implementing transactions "by hand."
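
    The abstract above does not spell out the mechanism itself, so the following is only a minimal sketch, in Python, of a generic deterministic contention-management policy (fixed age-based priorities, with the younger transaction aborting on conflict). The names Transaction and resolve_conflict are hypothetical; this is not the paper's algorithm, only an illustration of how a deterministic policy can guarantee that some transaction always makes progress.

        # Illustrative sketch only: a generic deterministic contention manager in which
        # each transaction receives a fixed priority (its start order) and, on conflict,
        # the younger transaction aborts.  The oldest live transaction is therefore never
        # aborted and eventually commits, giving forward progress.
        # NOT the paper's algorithm; all names are hypothetical.

        import itertools

        _counter = itertools.count()            # global source of fixed priorities

        class Transaction:
            def __init__(self):
                self.priority = next(_counter)  # smaller number = older = higher priority
                self.aborted = False

        def resolve_conflict(t1, t2):
            """Deterministically abort the younger of two conflicting transactions."""
            loser = t1 if t1.priority > t2.priority else t2
            loser.aborted = True
            return loser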

    Randomized Routing on Fat-Trees

    Fat-trees are a class of routing networks for hardware-efficient parallel computation. This paper presents a randomized algorithm for routing messages on a fat-tree. The quality of the algorithm is measured in terms of the load factor of a set of messages to be routed, which is a lower bound on the time required to deliver the messages. We show that if a set of messages has load factor lambda on a fat-tree with n processors, the number of delivery cycles (routing attempts) that the algorithm requires is O(lambda + lg n lg lg n) with probability 1-O(1/n). The best previous bound was O(lambda lg n) for the offline problem in which the set of messages is known in advance. In the context of a VLSI model that equates hardware cost with physical volume, the routing algorithm can be used to demonstrate that fat-trees are universal routing networks. Specifically, we prove that any routing network can be efficiently simulated by a fat-tree of comparable hardware cost.
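
    For concreteness, the sketch below computes the load-factor metric the abstract measures against: the number of messages that must cross a channel divided by that channel's capacity, maximized over all channels. The route function and capacity map are hypothetical stand-ins; none of the paper's fat-tree-specific routing is reproduced.

        # Hypothetical sketch: computing the load factor of a message set.
        # `route(msg)` is assumed to return the channels the message crosses, and
        # `capacity[c]` the bandwidth of channel c; both are stand-ins, not the
        # paper's actual fat-tree construction.

        from collections import Counter

        def load_factor(messages, route, capacity):
            load = Counter()
            for msg in messages:
                for channel in route(msg):
                    load[channel] += 1
            return max(load[c] / capacity[c] for c in load) if load else 0.0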

    Transactions Everywhere

    Arguably, one of the biggest deterrents for software developers who might otherwise choose to write parallel code is that parallelism makes their lives more complicated. Perhaps the most basic problem inherent in the coordination of concurrent tasks is the enforcing of atomicity so that the partial results of one task do not inadvertently corrupt another task. Atomicity is typically enforced through locking protocols, but these protocols can introduce other complications, such as deadlock, unless restrictive methodologies in their use are adopted. We have recently begun a research project focusing on transactional memory [18] as an alternative mechanism for enforcing atomicity, since it allows the user to avoid many of the complications inherent in locking protocols. Rather than viewing transactions as infrequent occurrences in a program, as has generally been done in the past, we have adopted the point of view that all user code should execute in the context of some transaction. To make this viewpoint viable requires the development of two key technologies: effective hardware support for scalable transactional memory, and linguistic and compiler support. This paper describes our preliminary research results on making “transactions everywhere” a practical reality. Singapore-MIT Alliance (SMA)

    Data-race detection in transactions-everywhere parallel programming

    Thesis (M.Eng.)--Massachusetts Institute of Technology, Dept. of Electrical Engineering and Computer Science, 2003. Includes bibliographical references (p. 69-72). This electronic version was submitted by the student author; the certified thesis is available in the Institute Archives and Special Collections. This thesis studies how to perform dynamic data-race detection in programs using "transactions everywhere", a new methodology for shared-memory parallel programming. Since the conventional definition of a data race does not make sense in the transactions-everywhere methodology, this thesis develops a new definition based on a weak assumption about the correctness of the target program's parallel-control flow, which is made in the same spirit as the assumption underlying the conventional definition. This thesis proves, via a reduction from the problem of 3cnf-formula satisfiability, that data-race detection in the transactions-everywhere methodology is an NP-complete problem. In view of this result, it presents an algorithm that approximately detects data races. The algorithm never reports false negatives. When a possible data race is detected, the algorithm outputs simple information that allows the programmer to efficiently resolve the root of the problem. The algorithm requires running time that is worst-case quadratic in the size of a graph representing all the scheduling constraints in the target program. By Kai Huang. M.Eng.
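
    The thesis's own detector is not reproduced here. As a generic building block only, the sketch below shows how two events can be tested for being logically in parallel (neither ordered before the other) by reachability in a graph of scheduling constraints, the kind of graph the abstract refers to; the function names and graph representation are assumptions.

        # Generic building block, not the thesis's algorithm: two events can form a
        # potential race only if neither is ordered before the other in the graph of
        # scheduling constraints.  A plain DFS reachability test over an adjacency-list
        # DAG is shown; the thesis's approximate detector is more involved.

        def reaches(graph, src, dst):
            """DFS: is there a path of scheduling constraints from src to dst?"""
            stack, seen = [src], {src}
            while stack:
                node = stack.pop()
                if node == dst:
                    return True
                for nxt in graph.get(node, ()):
                    if nxt not in seen:
                        seen.add(nxt)
                        stack.append(nxt)
            return False

        def logically_parallel(graph, a, b):
            """Events a and b are unordered iff neither reaches the other."""
            return not reaches(graph, a, b) and not reaches(graph, b, a)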

    Efficient Out-of-Core Algorithms for Linear Relaxation Using Blocking Covers

    When a numerical computation fails to fit in the primary memory of a serial or parallel computer, a so-called “out-of-core” algorithm, which moves data between primary and secondary memories, must be used. In this paper, we study out-of-core algorithms for sparse linear relaxation problems in which each iteration of the algorithm updates the state of every vertex in a graph with a linear combination of the states of its neighbors. We give a general method that can save substantially on the I/O traffic for many problems. For example, our technique allows a computer with M words of primary memory to perform T = Ω(M^{1/5}) cycles of a multigrid algorithm for a two-dimensional elliptic solver over an n-point domain using only Θ(nT/M^{1/5}) I/O transfers, as compared with the naive algorithm which requires Ω(nT) I/O's. Our method depends on the existence of a “blocking” cover of the graph that underlies the linear relaxation. A blocking cover has the property that the subgraphs forming the cover have large diameters once a small number of vertices have been removed. The key idea in our method is to introduce a variable for each removed vertex for each time step of the algorithm. We maintain linear dependences among the removed vertices, thereby allowing each subgraph to be iteratively relaxed without external communication. We give a general theorem relating blocking covers to I/O-efficient relaxation schemes. We also give an automatic method for finding blocking covers for certain classes of graphs, including planar graphs and d-dimensional simplicial graphs with constant aspect ratio (i.e., graphs that arise from dividing d-space into “well-shaped” polyhedra). As a result, we can perform T iterations of linear relaxation on any n-vertex planar graph using only Θ(n + nT lg n / M^{1/4}) I/O's or on any n-node d-dimensional simplicial graph with constant aspect ratio using only Θ(n + nT lg n / M^{Ω(1/d)}) I/O's.
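
    To make concrete what one step of "linear relaxation" means here (every vertex's state replaced by a linear combination of its neighbors' states), the following is a minimal Jacobi-style sketch; the blocking-cover machinery that actually saves I/O is not shown, and the data layout is a simplifying assumption.

        # Minimal sketch of linear relaxation (Jacobi style): each iteration replaces
        # every vertex's state with a linear combination of its neighbors' old states.
        # neighbors[v] may include v itself if the update has a self term.
        # The blocking-cover I/O optimization from the paper is not modeled here.

        def relax_once(neighbors, weights, state):
            """neighbors[v] lists v's neighbors; weights[(v, u)] is the coefficient."""
            return {v: sum(weights[(v, u)] * state[u] for u in neighbors[v])
                    for v in state}

        def relax(neighbors, weights, state, T):
            for _ in range(T):
                state = relax_once(neighbors, weights, state)
            return state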

    Provably Efficient Adaptive Scheduling for Parallel Jobs

    Scheduling competing jobs on multiprocessors has always been an important issue for parallel and distributed systems. The challenge is to ensure global, system-wide efficiency while offering a level of fairness to user jobs. Various degrees of success have been achieved over the years. However, few existing schemes address both efficiency and fairness over a wide range of workloads. Moreover, in order to obtain analytical results, most of them require prior information about jobs, which may be difficult to obtain in real applications. This paper presents two novel adaptive scheduling algorithms -- GRAD for centralized scheduling, and WRAD for distributed scheduling. Both GRAD and WRAD ensure fair allocation under all levels of workload, and they offer provable efficiency without requiring prior information about a job's parallelism. Moreover, they provide effective control over the scheduling overhead and ensure efficient utilization of processors. To the best of our knowledge, they are the first non-clairvoyant scheduling algorithms that offer such guarantees. We also believe that our new approach of a resource request-allotment protocol deserves further exploration. Specifically, both GRAD and WRAD are O(1)-competitive with respect to mean response time for batched jobs, and O(1)-competitive with respect to makespan for non-batched jobs with arbitrary release times. The simulation results show that, for non-batched jobs, the makespan produced by GRAD is no more than 1.39 times the optimal on average and never exceeds 4.5 times. For batched jobs, the mean response time produced by GRAD is no more than 2.37 times the optimal on average, and never exceeds 5.5 times. Singapore-MIT Alliance (SMA)
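
    The abstract names a request-allotment protocol without detailing it, so the sketch below shows only one generic allotment rule (equal shares capped by each job's request, with leftovers redistributed) as an illustration of the idea; it is not the GRAD or WRAD rule, and the interface is hypothetical.

        # Hedged sketch of a generic request-allotment step, NOT the GRAD/WRAD rule:
        # give each job an equal share of the P processors, capped by the job's own
        # request, and redistribute any leftover processors to still-hungry jobs.

        def allot(requests, P):
            """requests: {job: desired processors}; returns {job: allotted processors}."""
            allotment = {job: 0 for job in requests}
            remaining = P
            hungry = [job for job in requests if requests[job] > 0]
            while remaining > 0 and hungry:
                share = max(1, remaining // len(hungry))
                for job in list(hungry):
                    give = min(share, requests[job] - allotment[job], remaining)
                    allotment[job] += give
                    remaining -= give
                    if allotment[job] >= requests[job]:
                        hungry.remove(job)
                    if remaining == 0:
                        break
            return allotment

        # Example: allot({"A": 10, "B": 1, "C": 3}, P=8) gives B one processor,
        # C three, and the rest to A, never exceeding any job's request.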

    The Cilkview scalability analyzer

    The Cilkview scalability analyzer is a software tool for profiling, estimating scalability, and benchmarking multithreaded Cilk++ applications. Cilkview monitors logical parallelism during an instrumented execution of the Cilk++ application on a single processing core. As Cilkview executes, it analyzes logical dependencies within the computation to determine its work and span (critical-path length). These metrics allow Cilkview to estimate parallelism and predict how the application will scale with the number of processing cores. In addition, Cilkview analyzes scheduling overhead using the concept of a “burdened dag,” which allows it to diagnose performance problems in the application due to an insufficient grain size of parallel subcomputations. Cilkview employs the Pin dynamic-instrumentation framework to collect metrics during a serial execution of the application code. It operates directly on the optimized code rather than on a debug version. Metadata embedded by the Cilk++ compiler in the binary executable identifies the parallel control constructs in the executing application. This approach introduces little or no overhead to the program binary in normal runs. Cilkview can perform real-time scalability benchmarking automatically, producing gnuplot-compatible output that allows developers to compare an application’s performance with the tool’s predictions. If the program performs beneath the range of expectation, the programmer can be confident in seeking a cause such as insufficient memory bandwidth, false sharing, or contention, rather than inadequate parallelism or insufficient grain size.
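
    The work and span metrics that Cilkview reports have standard interpretations: parallelism is work divided by span, and a greedy scheduler on P cores needs roughly work/P + span time. The sketch below does only this textbook arithmetic; Cilkview's burdened-dag overhead model is not reproduced.

        # Standard work/span arithmetic of the kind Cilkview reports (the burdened-dag
        # overhead model is not modeled here).  Parallelism = work / span, and a greedy
        # scheduler runs in time at most work/P + span, so a rough speedup estimate on
        # P cores is work / (work/P + span).

        def parallelism(work, span):
            return work / span

        def predicted_speedup(work, span, P):
            """Optimistic speedup estimate for P cores from the work and span bounds."""
            return work / (work / P + span)

        # Example: work 1e9 and span 1e6 give parallelism 1000, so the program
        # should scale nearly linearly on a small number of cores.
        print(parallelism(1e9, 1e6))           # 1000.0
        print(predicted_speedup(1e9, 1e6, 8))  # about 7.94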

    A Media Player for Use in Distance Education

    We have developed a media player for use in distance education. The player can incorporate several time-indexed sources, including video, audio, PowerPoint, and a text index. We have converted all the SMA 5503 Introduction to Algorithms lecture videos for use with the player. Student response to the player has been excellent. The media player's graphical user interface (see Fig. 1) permits a student viewer to navigate a video by the text index. The text index can be produced either automatically by a PowerPoint plug-in developed by David Mycue of MIT's Academic Media Production Systems, or manually by a teaching assistant using an ordinary text editor. The student can flip easily through PowerPoint slides, invoking the video only when needed, rather than being forced to watch a video presentation of material the student already understands. The player also allows playback to be accelerated up to twice real-time speed without changing the speaker's tone, thereby allowing the student to watch a video in much less time than would otherwise be possible. A single-click button allows the student to skip backward in the video a few seconds, easing the replay of critical material. Another single-click button allows the user to skip forward. Although none of the features in this media player are unique or original, their combination provides a platform for lecture viewing superior to the one currently used by SMA. Ongoing work includes adding more features to the player, such as allowing students to add multiple streams of other information (e.g., notes, snapshots of the lecture, etc.), which will be synchronized with the video in the same manner as the text indices and the slides in the current version. Singapore-MIT Alliance (SMA)

    Programming with Exceptions in JCilk

    JCilk extends the Java language to provide call-return semantics for multithreading, much as Cilk does for C. Java's built-in thread model does not support the passing of exceptions or return values from one thread back to the "parent" thread that created it. JCilk imports Cilk's fork-join primitives spawn and sync into Java to provide procedure-call semantics for concurrent subcomputations. This paper shows how JCilk integrates exception handling with multithreading by defining semantics consistent with the existing semantics of Java's try and catch constructs, but which handle concurrency in spawned methods. JCilk's strategy of integrating multithreading with Java's exception semantics yields some surprising semantic synergies. In particular, JCilk extends Java's exception semantics to allow exceptions to be passed from a spawned method to its parent in a natural way that obviates the need for Cilk's inlet and abort constructs. This extension is "faithful" in that it obeys Java's ordinary serial semantics when executed on a single processor. When executed in parallel, however, an exception thrown by a JCilk computation signals its sibling computations to abort, which yields a clean semantics in which only a single exception from the enclosing try block is handled. The decision to implicitly abort side computations opens a Pandora's box of subsidiary linguistic problems to be resolved, however. For instance, aborting might cause a computation to be interrupted asynchronously, causing havoc in programmer understanding of code behavior. To minimize the complexity of reasoning about aborts, JCilk signals them "semisynchronously" so that abort signals do not interrupt ordinary serial code. In addition, JCilk propagates an abort signal throughout a subcomputation naturally with a built-in CilkAbort exception, thereby allowing programmers to handle clean-up by simply catching the CilkAbort exception. The semantics of JCilk allow programs with speculative computations to be programmed easily. Speculation is essential for parallelizing programs such as branch-and-bound or heuristic search. We show how JCilk's linguistic mechanisms can be used to program a solution to the "queens" problem and an implementation of a parallel alpha-beta search. Singapore-MIT Alliance (SMA)
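
    JCilk's actual syntax is not reproduced here. As a loose Python analogy only, the sketch below mimics the behavior the abstract describes: sibling speculative computations run concurrently, the first exception is the only one handled, and the remaining siblings are signalled to abort and clean up by catching an abort exception. All names in the sketch are hypothetical.

        # Loose Python analogy (not JCilk syntax): sibling speculative tasks share an
        # abort flag; when one task raises an exception, the parent handles that single
        # exception, sets the flag, and the other siblings raise a CilkAbort-like
        # exception internally so they can clean up and stop.

        import threading, time, concurrent.futures

        class AbortSignal(Exception):
            """Stands in for JCilk's built-in CilkAbort exception (analogy only)."""

        def search_branch(branch, abort, fail_on):
            try:
                for step in range(50):                  # pretend speculative work
                    if abort.is_set():
                        raise AbortSignal()
                    if branch == fail_on and step == 5:
                        raise ValueError(f"branch {branch} raised")
                    time.sleep(0.01)
                return f"branch {branch} ran to completion"
            except AbortSignal:
                return f"branch {branch} aborted and cleaned up"

        abort = threading.Event()
        with concurrent.futures.ThreadPoolExecutor() as pool:
            futures = [pool.submit(search_branch, b, abort, fail_on=2) for b in (1, 2, 3)]
            for f in concurrent.futures.as_completed(futures):
                try:
                    print(f.result())
                except ValueError as e:   # only one exception from the "try block" is handled
                    print("handled:", e)
                    abort.set()           # tell the sibling computations to abort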